Search | BVS Regional Portal

Reinforcement Learning With Human Advice: A Survey.

Najar, Anis; Chetouani, Mohamed.

Front Robot AI ; 8: 584075, 2021.

Article in English | MEDLINE | ID: mdl-34141726

ABSTRACT

In this paper, we provide an overview of the existing methods for integrating human advice into a reinforcement learning process. We first propose a taxonomy of the different forms of advice that can be provided to a learning agent. We then describe the methods that can be used for interpreting advice when its meaning is not determined beforehand. Finally, we review different approaches for integrating advice into the learning process.

The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning.

Najar, Anis; Bonnet, Emmanuelle; Bahrami, Bahador; Palminteri, Stefano.

PLoS Biol ; 18(12): e3001028, 2020 Dec.

Article in English | MEDLINE | ID: mdl-33290387

ABSTRACT

While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists of transiently biasing the learner's action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator's value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator's actions directly affect the learner's value function. We tested these three hypotheses in 2 experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement learning task. We show through model comparison and model simulation that VS provides the best explanation of the learners' behavior. The results were replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators' choices and found that learners were able to adapt their imitation rate, so that only skilled demonstrators were imitated. We proposed and tested an efficient meta-learning process to account for this effect, where imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights and perspectives on the computational mechanisms underlying adaptive imitation in human reinforcement learning.
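The difference between the three hypotheses may be easier to see in code. The following is a minimal sketch only, not the authors' fitted models: the function names, the mixing rule for DB, the frequency-tracking stand-in for inverse reinforcement learning in MB, and the parameters `bias`, `alpha_imit`, `alpha_demo` and `pseudo_reward` are all assumptions made for illustration. What matters is which quantity each rule touches: DB changes only the choice probabilities, MB updates a separate model of the demonstrator, and VS writes directly into the learner's own values.

```python
import math


def softmax(values, beta):
    """Turn action values into choice probabilities (beta = inverse temperature)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def q_update(q, action, reward, alpha):
    """Ordinary delta-rule update from the learner's own reward."""
    q[action] += alpha * (reward - q[action])


def decision_biasing_policy(q, demo_action, beta, bias):
    """DB (illustrative): mix the learner's policy with the observed action.

    The value list q is read but never modified, so the bias is transient.
    """
    probs = softmax(q, beta)
    return [
        (1.0 - bias) * p + bias * (1.0 if a == demo_action else 0.0)
        for a, p in enumerate(probs)
    ]


def model_based_update(demo_q, demo_action, alpha_demo):
    """MB (illustrative): update a separate estimate of the demonstrator's preferences.

    Assumption: a running average of observed choices is used here as a crude
    stand-in for inverse reinforcement learning.
    """
    for a in range(len(demo_q)):
        target = 1.0 if a == demo_action else 0.0
        demo_q[a] += alpha_demo * (target - demo_q[a])


def value_shaping_update(q, demo_action, alpha_imit, pseudo_reward=1.0):
    """VS (illustrative): treat the observed action as a pseudo-reward.

    The learner's own value for the demonstrated action is changed directly.
    """
    q[demo_action] += alpha_imit * (pseudo_reward - q[demo_action])


if __name__ == "__main__":
    q = [0.0, 0.0]
    print("DB policy:", decision_biasing_policy(q, demo_action=1, beta=5.0, bias=0.2))
    print("q after DB:", q)
    value_shaping_update(q, demo_action=1, alpha_imit=0.5)
    print("q after VS:", q)
```

Under these assumptions, a DB learner's preference for the demonstrated action disappears as soon as the demonstration is no longer shown, whereas a VS learner carries it forward in its value estimates.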

Subjects

Imitative Behavior/physiology , Social Reinforcement , Social Learning/physiology , Adult , Female , Humans , Learning/physiology , Male , Theoretical Models , Psychological Reinforcement , Reward , Young Adult
